DATA2002 • Semester 2 • Manav Khemchandani
The Price of a Dream
Exploring what drives the cost of a dream house in the city of dreams.
Focus: Do bigger houses and lots actually mean higher prices?
Focus: Does having more amenities or newer construction raise house prices?
Focus: Do neighbourhood and environmental factors matter?
Focus: Do older homes lose value, and does the heating/cooling system influence price?
Box plots of price by amenities
Linear Regression results:
\(\text{Price} = 42113.84 + 89371.81\cdot \text{Bathroom}\)
\(\text{Price} = 171820.23 + 66743.97\cdot \text{Fireplace}\)
Box plots of price by number of rooms and by new-construction status
Linear Regression results:
\(\text{Price} = 52624.99 + 22616.82\cdot \text{Room}\)
\(\text{Price} = 208580.78 + 73726.04\cdot \text{NewConstruct}\)
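To make the four single-predictor fits concrete, the sketch below evaluates each fitted line at a chosen value. The coefficients are copied from the results above; the names `SIMPLE_FITS` and `predict_simple` are illustrative only, and `NewConstruct` is assumed to be coded as a 0/1 indicator.

```python
# Fitted single-predictor lines from the poster, stored as (intercept, slope).
SIMPLE_FITS = {
    "Bathroom": (42113.84, 89371.81),
    "Fireplace": (171820.23, 66743.97),
    "Room": (52624.99, 22616.82),
    "NewConstruct": (208580.78, 73726.04),  # assumption: 0/1 indicator
}


def predict_simple(predictor: str, value: float) -> float:
    """Predicted price from one single-predictor fit."""
    intercept, slope = SIMPLE_FITS[predictor]
    return intercept + slope * value


if __name__ == "__main__":
    examples = [("Bathroom", 2), ("Fireplace", 1), ("Room", 7), ("NewConstruct", 1)]
    for name, value in examples:
        print(f"{name} = {value}: predicted price {predict_simple(name, value):,.2f}")
```

Each fit ignores every other variable, so these predictions describe average price differences rather than the isolated effect of one feature.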
Multivariable linear regression: How does the price differ when all variables are combined?
\(\text{Predicted Price} = -240.6 + 59749.4\cdot \text{Bathrooms} + 19561.7\cdot \text{Fireplaces} + 12322.5\cdot \text{Rooms} + 663.1\cdot \text{NewlyConstruct}\)
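As a rough illustration of how the combined equation is used, the sketch below plugs a house description into the fitted coefficients. The coefficients come from the equation above; the function name, the dictionary layout, and the 0/1 coding of `NewlyConstruct` are assumptions made for the example.

```python
# Coefficients of the combined model, copied from the poster.
INTERCEPT = -240.6
COEFS = {
    "Bathrooms": 59749.4,
    "Fireplaces": 19561.7,
    "Rooms": 12322.5,
    "NewlyConstruct": 663.1,  # assumption: 0/1 indicator
}


def predict_price(house: dict) -> float:
    """Predicted price for a house; features missing from `house` count as 0."""
    return INTERCEPT + sum(COEFS[name] * house.get(name, 0) for name in COEFS)


if __name__ == "__main__":
    # Hypothetical house, chosen only to show the arithmetic.
    house = {"Bathrooms": 2, "Fireplaces": 1, "Rooms": 7, "NewlyConstruct": 0}
    print(f"Predicted price: {predict_price(house):,.1f}")
```

Compared with the single-predictor fits, the bathroom slope falls from 89371.81 to 59749.4 and the new-construction slope from 73726.04 to 663.1, which suggests these variables share much of their explanatory power with the others.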
Log-linear regression: Are there any outliers?
Multivariable \(R^2 = 0.4341173\)
Log-linear \(R^2 = 0.4192378\)
T-tests show a significant waterfront premium
Regression confirms Pct.College is a strong positive predictor of price
ANOVA shows public sewer systems are linked to higher prices
Combined regression: all three predictors remain significant when included together
Summary: Neighbourhood quality and infrastructure are systematically capitalised into housing prices.
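The waterfront comparison above rests on a two-sample t-test. The poster does not say which variant was used, so the sketch below assumes Welch's unequal-variance form; the function name and the example prices are placeholders, not values from the housing data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Placeholder prices for illustration only, NOT taken from the dataset.
    waterfront = [450_000, 520_000, 610_000, 480_000]
    inland = [210_000, 250_000, 190_000, 230_000, 275_000]
    t, df = welch_t(waterfront, inland)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

A large positive t statistic for waterfront minus non-waterfront prices would correspond to the premium reported above.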
Figure 1: Correlation Matrix of each pair of variables
Figure 2: Assumptions plots before and after log transformation
Log-Adjusted Model: R-squared: ~\(0.49\), Adjusted R-squared: ~\(0.49\)
\(\log(\text{Predicted Price})=11.2+4.01\times 10^{-04}\cdot \text{Living.Area}+0.14\cdot \text{Bathrooms}\)
Simple Comparison Model: R-squared: ~\(0.47\), Adjusted R-squared: ~\(0.47\)
\(\log(\text{Predicted Price})=11.28+5.06\times 10^{-04}\cdot \text{Living.Area}\)
Interaction Model: R-squared: ~\(0.49\), Adjusted R-squared: ~\(0.49\)
\(\log(\text{Predicted Price})=11.08+4.77\times 10^{-04}\cdot \text{Living.Area}+0.2\cdot \text{Bathrooms}-3.40\times 10^{-05}\cdot \text{Living.Area:Bathrooms}\)
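Coefficients in these three models are easier to read once converted to percentage effects. The sketch below assumes the fitted response is the natural log of price (the intercepts near 11 point that way) and uses the coefficients printed above; the function names are illustrative.

```python
import math


def pct_change_per_unit(coef: float) -> float:
    """Percent change in price per one-unit increase, for a log-price coefficient."""
    return (math.exp(coef) - 1.0) * 100.0


def interaction_log_price(living_area: float, bathrooms: float) -> float:
    """Predicted log price from the interaction model (coefficients from the poster)."""
    return (11.08 + 4.77e-04 * living_area + 0.2 * bathrooms
            - 3.40e-05 * living_area * bathrooms)


def living_area_slope(bathrooms: float) -> float:
    """Change in log price per extra unit of living area at a given bathroom count."""
    return 4.77e-04 - 3.40e-05 * bathrooms


if __name__ == "__main__":
    # Log-adjusted model: effect of one more bathroom, holding living area fixed.
    print(f"Extra bathroom: {pct_change_per_unit(0.14):.1f}% higher predicted price")
    # Interaction model: the living-area slope shrinks as bathrooms increase.
    for baths in (1, 2, 3):
        print(f"{baths} bathroom(s): slope {living_area_slope(baths):.2e}")
    # Back-transform a prediction for a hypothetical house to the price scale.
    print(f"Predicted price: {math.exp(interaction_log_price(2000, 2)):,.0f}")
```

The negative interaction term means each additional unit of living area adds slightly less to log price in homes with more bathrooms.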
Figure 3: Interactions Summary
Log-Adjusted VIF: Living.Area = 2.067436, Bathrooms = 2.067436
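With only two predictors, each VIF equals \(1/(1-r^2)\), where \(r\) is the correlation between them, which is why both values above are identical. The sketch below inverts that relationship to recover the implied correlation magnitude; the function names are illustrative, and the sign of \(r\) cannot be recovered from the VIF alone.

```python
import math


def vif_from_correlation(r: float) -> float:
    """VIF shared by both predictors in a two-predictor model."""
    return 1.0 / (1.0 - r * r)


def abs_correlation_from_vif(vif: float) -> float:
    """Magnitude of the pairwise correlation implied by a two-predictor VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


if __name__ == "__main__":
    r = abs_correlation_from_vif(2.067436)
    print(f"Implied |r| between Living.Area and Bathrooms: {r:.3f}")
```

The reported VIF implies a correlation of roughly 0.72 between Living.Area and Bathrooms.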
\(R^2=0.5905676\)
\(\text{Predicted Price} = 1346.1 + 69205.5\cdot \text{Living.Area} + 9133.5\cdot \text{Land Value} + 27682.6\cdot \text{Bathrooms} + 123373.7\cdot \text{Waterfront} + 7369.6\cdot \text{Lot.Size}\)
Multivariable \(R^2 = 0.6354036\)
Log-linear \(R^2 = 0.5627405\)
DATA2002 • Semester 2 • GROUP L21G03